
    What does validation of cases in electronic record databases mean? The potential contribution of free text

    Electronic health records are increasingly used for research. The definition of cases or endpoints often relies on coded diagnostic data, using a pre-selected group of codes. Validation of these cases, as ‘true’ cases of the disease, is crucial. There are, however, ambiguities in what is meant by validation in the context of electronic records. Validation usually implies comparison of a definition against a gold standard of diagnosis and the ability to identify false negatives (‘true’ cases which were not detected) as well as false positives (detected cases which did not have the condition). We argue that two separate concepts of validation are often conflated in existing studies: first, whether the GP thought the patient was suffering from a particular condition (which we term confirmation or internal validation); and second, whether the patient really had the condition (external validation). Few studies have the ability to detect false negatives, patients who never received a diagnostic code. Natural language processing is likely to open up the use of free text within the electronic record, which will facilitate both the validation of the coded diagnosis and the search for false negatives.
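
    To make the distinction concrete, the arithmetic below (all counts invented for illustration) shows why code-list validation studies that report only positive predictive value cannot speak to sensitivity, which requires exactly the false negatives the abstract highlights.

```python
# Illustrative sketch: why detecting false negatives matters for validation.
# All counts are made up; the 'gold standard' here stands for external
# validation (did the patient really have the condition), not GP confirmation.

tp = 180   # coded as a case, truly a case
fp = 20    # coded as a case, not truly a case
fn = 40    # truly a case, but never received a diagnostic code
tn = 9760  # neither coded nor a true case

ppv = tp / (tp + fp)          # what code-list validation studies usually report
sensitivity = tp / (tp + fn)  # needs false negatives, rarely measurable from codes alone
specificity = tn / (tn + fp)

print(f"PPV         = {ppv:.2f}")          # 0.90
print(f"Sensitivity = {sensitivity:.2f}")  # 0.82
print(f"Specificity = {specificity:.3f}")
```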

    Data quality in European primary care research databases. Report of a workshop held in London September 2013

    Primary care research databases provide a significant resource for health services and epidemiological research. However, since data are recorded primarily for clinical care, their suitability for research may vary widely according to the research application or the recording practices of individual general practitioners. A methodological approach for characterising data quality is required. We describe a one-day workshop entitled “Towards a common protocol for measuring and monitoring data quality in European primary care research databases”. Researchers, database experts and clinicians were invited to give their perspectives on data quality and to exchange ideas on what data quality metrics should be made available to researchers. We report the main outcomes of this workshop, including a summary of the presentations and discussions and a suggested way forward.

    A pragmatic approach for measuring data quality in primary care databases

    There is currently no widely recognised methodology for undertaking data quality assessment in electronic health records used for research. In an attempt to address this, we have developed a protocol for measuring and monitoring data quality in primary care research databases, whereby practice-based data quality measures are tailored to the intended use of the data. Our approach was informed by an in-depth investigation of aspects of data quality in the Clinical Practice Research Datalink GOLD database and presentations of the results to data users. Although based on a primary care database, much of our proposed approach would be equally applicable to other health care databases.
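
    As one illustration of a measure tailored to intended use, the sketch below computes per-practice completeness of a single variable and flags practices below a threshold. The record layout, the variable and the 80% cut-off are assumptions for the example, not taken from the protocol.

```python
# Hypothetical sketch of a practice-level data quality screen: for a study that
# needs smoking status, measure per-practice completeness and flag outliers.

from collections import defaultdict

records = [
    {"practice": "P001", "patient": 1, "smoking_status": "never"},
    {"practice": "P001", "patient": 2, "smoking_status": None},
    {"practice": "P002", "patient": 3, "smoking_status": "current"},
    {"practice": "P002", "patient": 4, "smoking_status": "ex"},
]

totals, complete = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["practice"]] += 1
    if r["smoking_status"] is not None:
        complete[r["practice"]] += 1

for practice, n in totals.items():
    completeness = complete[practice] / n
    flag = "" if completeness >= 0.8 else "  <- review before use"
    print(f"{practice}: {completeness:.0%} complete{flag}")
```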

    Quality of recording of diabetes in the UK: how does the GP’s method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database

    Objective: To assess the effect of coding quality on estimates of the incidence of diabetes in the UK between 1995 and 2014. Design: A cross-sectional analysis examining diabetes coding from 1995 to 2014 and how the choice of codes (diagnosis codes vs codes which suggest diagnosis) and the quality of coding affect estimated incidence. Setting: Routine primary care data from 684 practices contributing to the UK Clinical Practice Research Datalink (data contributed from Vision (INPS) practices). Main outcome measure: Incidence rates of diabetes and how they are affected by (1) GP coding and (2) excluding ‘poor’ quality practices with at least 10% of incident patients inaccurately coded between 2004 and 2014. Results: Incidence rates and accuracy of coding varied widely between practices, and the trends differed according to the selected category of code. If diagnosis codes were used, the incidence of type 2 diabetes increased sharply until 2004 (when the UK Quality and Outcomes Framework was introduced), then flattened off until 2009, after which it decreased. If non-diagnosis codes were included, the numbers continued to increase until 2012. Although coding quality improved over time, 15% of the 666 practices that contributed data between 2004 and 2014 were labelled ‘poor’ quality. When these practices were dropped from the analyses, the downward trend in the incidence of type 2 diabetes after 2009 became less marked and incidence rates were higher. Conclusions: In contrast to some previous reports, diabetes incidence (based on diagnostic codes) appears not to have increased since 2004 in the UK. The choice of codes can make a significant difference to incidence estimates, as can the quality of recording. Codes and data quality should be checked when assessing incidence rates using GP data.
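
    The sketch below illustrates the kind of sensitivity analysis the paper implies: recomputing yearly incidence under a narrow versus a broad code list, with and without practices flagged as poor quality. All events, denominators and flags are invented for the example.

```python
# Illustrative sensitivity analysis: diabetes incidence per 1,000 person-years
# under (a) diagnosis codes only vs (b) diagnosis plus suggestive codes, and
# with or without practices flagged as 'poor' quality.

events = [  # (year, practice, code_category) -- synthetic
    (2009, "P1", "diagnosis"), (2009, "P2", "suggestive"),
    (2010, "P1", "diagnosis"), (2010, "P3", "diagnosis"),
]
person_years = {2009: 150_000, 2010: 152_000}  # synthetic denominators
poor_quality = {"P3"}  # e.g. >=10% of incident patients inaccurately coded

def incidence(year, include_suggestive=False, drop_poor=False):
    n = sum(
        1 for (y, p, cat) in events
        if y == year
        and (include_suggestive or cat == "diagnosis")
        and not (drop_poor and p in poor_quality)
    )
    return 1000 * n / person_years[year]  # per 1,000 person-years

for year in (2009, 2010):
    print(year,
          f"narrow: {incidence(year):.4f}",
          f"broad, poor dropped: {incidence(year, True, True):.4f}")
```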

    Optimising use of electronic health records to describe the presentation of rheumatoid arthritis in primary care: a strategy for developing code lists

    Background: Research using electronic health records (EHRs) relies heavily on coded clinical data. Because of variation in coding practices, it can be difficult to aggregate the codes for a condition in order to define cases. This paper describes a methodology to develop ‘indicator markers’ found in patients with early rheumatoid arthritis (RA); these are a broader range of codes which may allow a probabilistic case definition for use where no diagnostic code has yet been recorded. Methods: We examined the EHRs of 5,843 patients in the General Practice Research Database, aged ≥30 years, with a first coded diagnosis of RA between 2005 and 2008. Lists of indicator markers for RA were developed initially by panels of clinicians drawing up code-lists and then modified based on scrutiny of the available data. The prevalence of indicator markers, and their temporal relationship to RA codes, was examined in patients from 3 years before to 14 days after the recorded RA diagnosis. Findings: Indicator markers were common throughout the EHRs of RA patients, with 83.5% having two or more markers. 34% of patients received a disease-specific prescription before RA was coded, 42% had a referral to rheumatology, and 63% had a test for rheumatoid factor. 65% had at least one joint symptom or sign recorded, and in 44% this was at least 6 months before the recorded RA diagnosis. Conclusion: Indicator markers of RA may be valuable for case definition in patients who do not yet have a diagnostic code. The clinical diagnosis of RA is likely to occur some months before it is coded, as shown by markers frequently occurring ≥6 months before the recorded diagnosis. It is difficult to differentiate delay in diagnosis from delay in recording. Information concealed in free text may be required for the accurate identification of patients and to assess the quality of care in general practice.
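
    A minimal sketch of the windowing step described in the methods (events from 3 years before to 14 days after the first RA code), with placeholder marker categories and invented dates:

```python
# Sketch: for each RA patient, keep coded events falling between 3 years before
# and 14 days after the first RA diagnosis code, then collect the distinct
# indicator-marker categories seen. Marker lists here are placeholders, not the
# paper's code-lists.

from datetime import date, timedelta

indicator_markers = {"referral_rheum", "rheumatoid_factor_test", "joint_symptom"}

def markers_in_window(events, ra_date):
    """events: list of (date, marker_category). Returns categories in window."""
    lo = ra_date - timedelta(days=3 * 365)
    hi = ra_date + timedelta(days=14)
    return {cat for (d, cat) in events
            if lo <= d <= hi and cat in indicator_markers}

events = [(date(2006, 3, 1), "joint_symptom"),
          (date(2007, 1, 10), "rheumatoid_factor_test"),
          (date(2001, 5, 5), "joint_symptom")]  # too early: outside window
print(markers_in_window(events, ra_date=date(2007, 6, 1)))
```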

    Determining the date of diagnosis – is it a simple matter? The impact of different approaches to dating diagnosis on estimates of delayed care for ovarian cancer in UK primary care

    Background: Studies of cancer incidence and early management will increasingly draw on routine electronic patient records. However, data may be incomplete or inaccurate. We developed a generalisable strategy for investigating presenting symptoms and delays in diagnosis, using ovarian cancer as an example. Methods: The General Practice Research Database was used to investigate the time between first report of a symptom and diagnosis in 344 women diagnosed with ovarian cancer between 01/06/2002 and 31/05/2008. The effects of possible inaccuracies in the dating of diagnosis on the frequencies and timing of the most commonly reported symptoms were investigated using four increasingly inclusive definitions of first diagnosis/suspicion: 1. "Definite diagnosis"; 2. "Ambiguous diagnosis"; 3. "First treatment or complication suggesting pre-existing diagnosis"; 4. "First relevant test or referral". Results: The most commonly coded symptoms before a definite diagnosis of ovarian cancer were abdominal pain (41%), urogenital problems (25%), abdominal distension (24%) and constipation/change in bowel habits (23%), with 70% of cases reporting at least one of these. The median time between first reporting each of these symptoms and diagnosis was 13, 21, 9.5 and 8.5 weeks respectively. 19% had a code for definitions 2 or 3 prior to definite diagnosis and 73% a code for definition 4. However, the proportions with symptoms and the delays were similar for all four definitions except definition 4, where the median delays were 8, 8, 3, 10 and 0 weeks respectively. Conclusion: Symptoms recorded in the General Practice Research Database are similar to those reported in the literature, although their frequency is lower than in studies based on self-report. Generalisable strategies for exploring the impact of recording practice on the date of diagnosis in electronic patient records are recommended, and studies which date diagnoses in GP records need to present sensitivity analyses based on investigation, referral and diagnosis data. Free text information may be essential for obtaining accurate estimates of incidence and for accurate dating of diagnoses.
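
    The four definitions lend themselves to a simple "earliest qualifying record" rule. The sketch below applies them to an invented patient record; the event-type names are placeholders, not the database's actual codes.

```python
# Sketch: date the diagnosis under four increasingly inclusive definitions by
# taking the earliest record whose event type the definition admits.

from datetime import date

DEFINITIONS = {  # definition number -> event types it admits
    1: {"definite_diagnosis"},
    2: {"definite_diagnosis", "ambiguous_diagnosis"},
    3: {"definite_diagnosis", "ambiguous_diagnosis",
        "treatment_or_complication"},
    4: {"definite_diagnosis", "ambiguous_diagnosis",
        "treatment_or_complication", "relevant_test_or_referral"},
}

record = [(date(2005, 1, 20), "relevant_test_or_referral"),
          (date(2005, 3, 5), "ambiguous_diagnosis"),
          (date(2005, 4, 2), "definite_diagnosis")]

for defn, allowed in DEFINITIONS.items():
    dates = [d for (d, kind) in record if kind in allowed]
    print(f"definition {defn}: diagnosis dated {min(dates)}")
```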

    How many mailouts? Could attempts to increase the response rate in the Iraq war cohort study be counterproductive?

    Background: Low response and reporting errors are major concerns for survey epidemiologists. However, while nonresponse is commonly investigated, the effects of misclassification are often ignored, possibly because they are hard to quantify. We investigate both sources of bias in a recent study of the effects of deployment to the 2003 Iraq war on the health of UK military personnel, and attempt to determine whether improving response rates by multiple mailouts was associated with increased misclassification error and hence increased bias in the results. Methods: Data for 17,162 UK military personnel were used to determine factors related to response, and inverse probability weights were used to assess nonresponse bias. The percentages of inconsistent and missing answers to health questions from the 10,234 responders were used as measures of misclassification in a simulation of the 'true' relative risks that would have been observed if misclassification had not been present. Simulated and observed relative risks of multiple physical symptoms and post-traumatic stress disorder (PTSD) were compared across response waves (number of contact attempts). Results: Age, rank, gender, ethnic group, enlistment type (regular/reservist) and contact address (military or civilian), but not fitness, were significantly related to response. Weighting for nonresponse had little effect on the relative risks. Of the respondents, 88% had responded by wave 2. Missing answers (total 3%) increased significantly (p < 0.001) between waves 1 and 4, from 2.4% to 7.3%, and the percentage with discrepant answers (total 14%) increased from 12.8% to 16.3% (p = 0.007). However, the adjusted relative risks decreased only slightly, from 1.24 to 1.22 for multiple physical symptoms and from 1.12 to 1.09 for PTSD, and showed a similar pattern to those simulated. Conclusion: Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results. Although misclassification is difficult to assess, the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse. Resources might be better spent on improving and validating the data, rather than on increasing the response rate.
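
    A minimal sketch of the inverse-probability-weighting step the methods describe, on synthetic data with illustrative covariates (the study's actual model and variables are not reproduced here):

```python
# Sketch of inverse probability weighting for nonresponse: model response on
# covariates known for everyone, then weight each responder by 1 / P(response).
# All data below are synthetic stand-ins.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([rng.integers(18, 55, n),   # age
                     rng.integers(0, 2, n)])    # e.g. regular vs reservist
# Simulate a response mechanism that depends on the covariates.
p_respond = 1 / (1 + np.exp(-(-2.0 + 0.05 * X[:, 0] + 0.4 * X[:, 1])))
responded = rng.random(n) < p_respond

model = LogisticRegression(max_iter=1000).fit(X, responded)
weights = 1.0 / model.predict_proba(X)[:, 1]

# Weighted analyses of responders would use weights[responded]; if weighting
# barely moves the estimates (as in this study), nonresponse bias is likely small.
print("mean weight among responders:", weights[responded].mean().round(2))
```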